feat[layout]: list-of-struct nested projection#6012
feat[layout]: list-of-struct nested projection#6012godnight10061 wants to merge 17 commits intovortex-data:developfrom
Conversation
|
I wasn't able to apply labels from a fork (permission denied). Could a maintainer please add the ix label for the changelog/CI gating? |
| // List-of-struct field access: `$.items.a` where `items: list<struct{a,b}>`. | ||
| match input.dtype() { |
There was a problem hiding this comment.
We don't really want to tie the get_item to have this for_each element behaviour.
I am not sure we want get_item to pierce thought lists, we would want a this to be map_list(get_item("a", _), get_item("items", root()) I am not sure we have designed the map_list or the _ expression yet?
|
This is a great contribution, however we need to design the list expression more carefully |
Codecov Report❌ Patch coverage is ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Started a discussion: #6030, we can reopen once we find a good answer there. Maybe you can expand on your use case there. |
8a8803c to
11f06b6
Compare
| row_range: &Range<u64>, | ||
| splits: &mut BTreeSet<u64>, | ||
| ) -> VortexResult<()> { | ||
| splits.insert(row_range.end); |
There was a problem hiding this comment.
The one thing I struggled with the last time I tried to do ListLayout is that it's hard to split naturally.
Ideally you'd have a way to split based on the elements, then turn that back into row ranges...
d5a8576 to
87d9553
Compare
| fn propagate_nullability(parent_nullability: Nullability, field_dtype: DType) -> DType { | ||
| if matches!( | ||
| (parent_nullability, field_dtype.nullability()), | ||
| (Nullability::Nullable, Nullability::NonNullable) | ||
| ) { | ||
| return field_dtype.with_nullability(Nullability::Nullable); | ||
| } | ||
|
|
||
| field_dtype | ||
| } | ||
|
|
There was a problem hiding this comment.
do we need these changes?
| fn propagate_nullability(parent_nullability: Nullability, field_dtype: DType) -> DType { | ||
| if matches!( | ||
| (parent_nullability, field_dtype.nullability()), | ||
| (Nullability::Nullable, Nullability::NonNullable) | ||
| ) { | ||
| return field_dtype.with_nullability(Nullability::Nullable); | ||
| } | ||
|
|
||
| field_dtype | ||
| } |
| field_dtype | ||
| } | ||
|
|
||
| impl VTable for GetItemList { |
There was a problem hiding this comment.
make sure its clear this should never be seralized and relied on.
| (element_dtype.as_ref(), *list_nullability, Some(*list_size)) | ||
| } | ||
| _ => { | ||
| return Err(vortex_err!( |
|
|
||
| match input.dtype() { | ||
| DType::List(..) => { | ||
| let list = input.to_listview(); |
There was a problem hiding this comment.
please use execute, I think needs to be updated too
| let elements = list.elements(); | ||
| let elements = elements.to_struct(); | ||
|
|
||
| let field = elements.field_by_name(field_name).cloned()?; | ||
| let field = match elements.dtype().nullability() { | ||
| Nullability::NonNullable => field, | ||
| Nullability::Nullable => mask(&field, &elements.validity_mask().not())?, | ||
| }; |
There was a problem hiding this comment.
can we share this between list and fsl
| pb::GetItemOpts { | ||
| path: field_name.to_string(), | ||
| } | ||
| .encode_to_vec(), |
There was a problem hiding this comment.
this should panic with an error message
| let scan_projection = scan_projection | ||
| .optimize_recursive(vxf.dtype()) | ||
| .map_err(|e| { | ||
| exec_datafusion_err!( | ||
| "Couldn't simplify the underlying Vortex scan projection: {e}" | ||
| ) | ||
| })?; | ||
| let scan_dtype = scan_projection.return_dtype(vxf.dtype()).map_err(|e| { | ||
| exec_datafusion_err!("Couldn't get the dtype for the underlying Vortex scan: {e}") |
| col: &Expression, | ||
| scope_dtype: &DType, | ||
| ) -> VortexResult<Option<Expression>> { | ||
| let col = col.optimize_recursive(scope_dtype)?; |
| _ => None, | ||
| }) | ||
| .expect("items layout"); | ||
| assert_eq!(items_layout.encoding_id().as_ref(), "vortex.list"); |
There was a problem hiding this comment.
can we do an assert_array_eq! that the result in what is expected?
| // | ||
| // NOTE: `vortex.get_item_list` is a temporary list-of-struct projection expression; | ||
| // when pushing down we construct the element projection and pass it into the elements reader. | ||
| let (is_pushdown, element_expr) = if let Some(field_name) = expr.as_opt::<GetItemList>() |
There was a problem hiding this comment.
you also need to teach the struct layout reader partitioner about this expression
There was a problem hiding this comment.
annotate_scope_access and (/Users/joeisaacs/git/spiraldb/vortex-4/vortex-array/src/expr/transform/partition.rs) otherwise the struct partitioning at the top level will be over cautious
| } else if expr.vtable().id().as_ref() == "vortex.select" { | ||
| (true, expr.clone()) |
There was a problem hiding this comment.
this should never be here, also use the expr id please
| self.lazy_children.get(idx) | ||
| } | ||
|
|
||
| fn list_slice_future( |
6a2a8f7 to
2973c55
Compare
|
@godnight10061 mark this as ready when it is |
a7a6b7d to
da77c0c
Compare
| if expr.is::<GetItemList>() | ||
| && let Some(field_name) = expr.child(0).as_opt::<GetItem>() | ||
| && expr.child(0).child(0).is::<Root>() | ||
| { |
There was a problem hiding this comment.
can you justify this?
76f2c73 to
830efb6
Compare
CodSpeed Performance ReportMerging this PR will degrade performance by 33.07%Comparing
|
That's a brutal regression. -33% is rough. The get_item_list overhead is definitely killing the vectorization. I think we need to pivot: instead of wrapping this, we should push the projection directly into the list reader's child scans. I'll check the flamegraphs to confirm. |
830efb6 to
cc87962
Compare
6c6326a to
3690d75
Compare
|
The CI was modified again. I am going to close and and I suggest we move this to discussion where we can refine the List Layout carefully |
Summary
Tests
Local (Windows):
Notes:
Fork CI (preflight):
Compatibility
References